@OuadiElfarouki
Contributor

@OuadiElfarouki OuadiElfarouki commented Oct 1, 2024

This PR adds the backend and device registry interfaces introduced in #9707 to the SYCL backend. Among other things, this re-enables test-backend-ops for SYCL.

This patch also implements most of the event APIs for the SYCL backend, fixes set_tensor_async, and enables asynchronous I/O and host-to-device (H2D) memory copies for model loading (similar to the CUDA backend implementation).
Some load-time improvements:

  • Nvidia A100 40GB + LLaMa 3.1 70B Q4 : 27.6s (master) -> 5.8s (patch), roughly 4.8x faster

  • Intel Arc A770 + LLaMa 3.1 8B Q4 : 1.6s (master) -> 0.8s (patch), 2x faster

  • I have read the contributing guidelines

  • Self-reported review complexity:

    • Low
    • Medium
    • High
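The asynchronous H2D loading described above can be illustrated with a minimal sketch. It assumes a double-buffered staging scheme in which an event guards each staging buffer; whether the patch uses exactly this scheme is not stated in this conversation. Everything here (`SimQueue`, `upload_pipelined`) is hypothetical and uses only the C++ standard library: the "device" is a host-side simulation of an in-order queue whose copies run only when something synchronizes on them, which makes it visible why a staging buffer must not be refilled before its event has completed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical simulation of an in-order device queue (not a SYCL or ggml API).
// Copies are only queued by copy_async() and actually run inside synchronize().
struct SimQueue {
    struct Copy { const uint8_t * src; size_t dst_offset; size_t size; };
    using Event = size_t;  // means "all copies with index < event are done"

    std::vector<uint8_t> device_mem;  // pretend device memory
    std::deque<Copy>     pending;     // submitted but not yet executed
    size_t submitted = 0;
    size_t completed = 0;

    explicit SimQueue(size_t size) : device_mem(size) {}

    void copy_async(const uint8_t * src, size_t dst_offset, size_t size) {
        assert(dst_offset + size <= device_mem.size());
        pending.push_back({src, dst_offset, size});
        ++submitted;
    }

    // Record an event that completes once everything submitted so far is done.
    Event record() const { return submitted; }

    // Block (here: execute queued copies) until the event has completed.
    void synchronize(Event ev) {
        assert(ev <= submitted);
        while (completed < ev) {
            const Copy c = pending.front();
            pending.pop_front();
            std::memcpy(device_mem.data() + c.dst_offset, c.src, c.size);
            ++completed;
        }
    }
};

// Upload `file` in chunks through two staging buffers. Before a staging buffer
// is overwritten, we wait on the event recorded after its previous copy.
inline void upload_pipelined(const std::vector<uint8_t> & file, SimQueue & q, size_t chunk) {
    assert(chunk > 0 && file.size() <= q.device_mem.size());
    constexpr size_t n_buffers = 2;
    std::vector<uint8_t> staging[n_buffers];
    SimQueue::Event      events[n_buffers] = {0, 0};

    size_t slot = 0;
    for (size_t offset = 0; offset < file.size(); offset += chunk) {
        const size_t size = std::min(chunk, file.size() - offset);

        q.synchronize(events[slot]);  // staging buffer is free again
        staging[slot].assign(file.data() + offset, file.data() + offset + size);  // "file read"
        q.copy_async(staging[slot].data(), offset, size);
        events[slot] = q.record();

        slot = (slot + 1) % n_buffers;
    }
    q.synchronize(q.record());  // drain the queue before the staging buffers go away
}
```

Calling `upload_pipelined(file, q, chunk)` with a `SimQueue` at least as large as `file` leaves `q.device_mem` holding the file contents. Dropping the `q.synchronize(events[slot])` line would let a later chunk overwrite a staging buffer that a queued copy still points at, which is the hazard the event APIs exist to prevent.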

Member

@slaren slaren left a comment


We cannot duplicate the async loading code in llama.cpp for each backend. In the next few days I will open a PR that allows this code to work with any backend that implements the necessary ggml-backend interfaces (#9707).
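As a rough illustration of the point above, a loader can be written once against a generic interface and pick the async path only when a backend provides the optional hooks. This is a sketch under stated assumptions: `backend_iface`, `load_tensor`, and the toy host backend are hypothetical names, not the ggml-backend API from #9707.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical backend interface: optional entry points are left null by
// backends that do not implement them.
struct backend_iface {
    void * ctx;
    void (*set_tensor)(void * ctx, const uint8_t * data, size_t offset, size_t size);        // required
    void (*set_tensor_async)(void * ctx, const uint8_t * data, size_t offset, size_t size);  // optional
    void (*synchronize)(void * ctx);                                                         // optional
};

inline bool supports_async_upload(const backend_iface & b) {
    return b.set_tensor_async != nullptr && b.synchronize != nullptr;
}

// Backend-agnostic loader: one code path for every backend, with a
// synchronous fallback. Returns true if the async path was used.
inline bool load_tensor(const backend_iface & b, const std::vector<uint8_t> & data) {
    assert(b.set_tensor != nullptr);
    if (supports_async_upload(b)) {
        b.set_tensor_async(b.ctx, data.data(), 0, data.size());
        b.synchronize(b.ctx);  // wait before the caller may reuse `data`
        return true;
    }
    b.set_tensor(b.ctx, data.data(), 0, data.size());
    return false;
}

// Toy host-memory backend used only to exercise the loader.
struct host_buffer { std::vector<uint8_t> mem; };

inline void host_set_tensor(void * ctx, const uint8_t * data, size_t offset, size_t size) {
    auto * buf = static_cast<host_buffer *>(ctx);
    assert(offset + size <= buf->mem.size());
    if (size > 0) {
        std::memcpy(buf->mem.data() + offset, data, size);
    }
}

inline void host_synchronize(void *) {}
```

A backend that fills in only `set_tensor` still loads correctly through the fallback, so the loader itself never needs backend-specific branches.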

@github-actions github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) Oct 1, 2024
@OuadiElfarouki
Contributor Author

Fair enough @slaren, thanks for the hint. I'll mark this PR as a draft until then.

@OuadiElfarouki OuadiElfarouki marked this pull request as draft October 1, 2024 22:37
@Alcpz
Contributor

Alcpz commented Oct 4, 2024

@OuadiElfarouki #9707 got merged 🎉

@OuadiElfarouki OuadiElfarouki marked this pull request as ready for review October 8, 2024 16:06
@OuadiElfarouki OuadiElfarouki changed the title from "[SYCL] Implementing async model loading for non mapped memory" to "[SYCL] Add SYCL Backend registry, device and Event Interfaces" Oct 8, 2024
Contributor

@Alcpz Alcpz left a comment


Minor suggestions and (possibly) a dumb question wrt the use of events.

@OuadiElfarouki OuadiElfarouki merged commit 87421a2 into ggml-org:master Oct 18, 2024
53 checks passed
drollings pushed a commit to drollings/llama.cpp that referenced this pull request Oct 18, 2024
…rg#9705)

* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
test passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024